Floating-Point Numbers with Error Estimates (revised)

Author

  • Glauco Masotti
Abstract

The original work [25] is reconsidered here, many years later. The old text has been revised, and several considerations have been added to clarify some controversial aspects of the work and to envision possible developments. A section has been added to review the impact of the original paper. The study addresses the problem of precision in floating-point (FP) computations. A method for estimating the errors which affect intermediate and final results is proposed, and a summary of many software simulations is discussed. The basic idea consists of representing FP numbers by means of a data structure collecting value and estimated error information. Under certain constraints, the estimate of the absolute error is accurate and has a compact statistical distribution. By monitoring the estimated relative error during a computation (an ad-hoc definition of relative error is used), the validity of results can be ensured. The error estimate enables the implementation of robust algorithms and the detection of ill-conditioned problems. A dynamic extension of number precision, under the control of error estimates, is advocated in order to compute results within given error bounds. A reduced time penalty could be achieved by a specialized FP processor. With current technology, the realization of a hardwired processor incorporating the method should no longer be a problem, and would make the practical adoption of the method feasible for most applications.

Index terms — floating-point computations, floating-point processor, floating-point errors, error estimation, numerical accuracy, ill-conditioned problems, computer arithmetic, dynamic precision extension.
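
The central idea lends itself to a small illustration: carry an estimated absolute error alongside every value and update it with each operation. The C sketch below is only an illustration under simple first-order propagation rules; the type and function names, and the propagation formulas themselves, are assumptions made here, not the scheme defined in the paper (which uses a statistical error estimate and its own definition of relative error).

    #include <float.h>
    #include <math.h>
    #include <stdio.h>

    /* Hypothetical "FP number with error estimate": a value together with
     * an estimate of the absolute error accumulated so far. */
    typedef struct {
        double val;   /* computed value           */
        double err;   /* estimated absolute error */
    } fpe_t;

    /* Wrap a plain double; charge it only with its representation error. */
    static fpe_t fpe_of(double x) {
        fpe_t r = { x, fabs(x) * DBL_EPSILON / 2.0 };
        return r;
    }

    /* First-order propagation for addition: input errors add up, plus the
     * rounding error of the operation itself. */
    static fpe_t fpe_add(fpe_t a, fpe_t b) {
        fpe_t r;
        r.val = a.val + b.val;
        r.err = a.err + b.err + fabs(r.val) * DBL_EPSILON / 2.0;
        return r;
    }

    /* First-order propagation for multiplication. */
    static fpe_t fpe_mul(fpe_t a, fpe_t b) {
        fpe_t r;
        r.val = a.val * b.val;
        r.err = fabs(a.val) * b.err + fabs(b.val) * a.err
              + fabs(r.val) * DBL_EPSILON / 2.0;
        return r;
    }

    /* Estimated relative error, monitored to judge whether a result is valid. */
    static double fpe_rel(fpe_t a) {
        return (a.val != 0.0) ? a.err / fabs(a.val) : a.err;
    }

    int main(void) {
        /* Catastrophic cancellation: (1 + 1e-15) - 1 keeps very few digits. */
        fpe_t x = fpe_add(fpe_of(1.0), fpe_of(1e-15));
        fpe_t y = fpe_add(x, fpe_of(-1.0));
        printf("value = %.17g  est. rel. error = %.1e\n", y.val, fpe_rel(y));
        return 0;
    }

Even this crude variant flags catastrophic cancellation: in the example, the estimated relative error of (1 + 1e-15) - 1 comes out around 0.3, signalling that the result has very few reliable digits.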

Similar Articles

Numerical Difficulties in Pre-University Informatics Education and Competitions

It is easy to underestimate the difficulties of using floating-point numbers in programming. This is especially the case in pre-university informatics education and competitions, where one is often led to believe that floating-point arithmetic is a good approximation of the real number system. However, most of the mathematical laws valid for real numbers break down when applied to floating-poin...
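
A one-line demonstration of such a broken law is associativity, which fails for IEEE 754 doubles; the constants below are just one convenient example.

    #include <stdio.h>

    int main(void) {
        /* Associativity fails in binary64: the two groupings below differ. */
        double a = 1e16, b = -1e16, c = 2.9;
        printf("(a + b) + c = %.17g\n", (a + b) + c);   /* prints 2.9 */
        printf("a + (b + c) = %.17g\n", a + (b + c));   /* prints 2   */
        return 0;
    }

With round-to-nearest binary64 arithmetic, the first grouping prints 2.9 and the second prints 2, because b + c rounds to a multiple of 2 near -1e16.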

Relative error due to a single bit-flip in floating-point arithmetic

We consider the error due to a single bit-flip in a floating point number. We assume IEEE 754 double precision arithmetic, which encodes binary floating point numbers in a 64-bit word [1]. We assume that the bit-flip happens randomly so it has equi-probability (1/64) to hit any of the 64 bits. Since we want to mitigate the assumption on our initial floating-point number, we assume that it is un...
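
The experiment can be sketched as follows: reinterpret a binary64 value as a 64-bit word, flip one bit, and measure the resulting relative error. The test value and the loop over all 64 positions are illustrative choices made here, not the paper's exact protocol.

    #include <math.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Flip bit k (0 = least significant) of the binary64 encoding of x. */
    static double flip_bit(double x, int k) {
        uint64_t bits;
        memcpy(&bits, &x, sizeof bits);       /* reinterpret without UB */
        bits ^= (uint64_t)1 << k;
        double y;
        memcpy(&y, &bits, sizeof y);
        return y;
    }

    int main(void) {
        double x = 3.141592653589793;
        for (int k = 0; k < 64; k++) {        /* each bit is equally likely to be hit */
            double rel = fabs(flip_bit(x, k) - x) / fabs(x);
            printf("bit %2d: relative error %.3e\n", k, rel);
        }
        return 0;
    }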

Accurate floating-point summation: a new approach

The aim of this paper is to find an accurate and efficient algorithm for evaluating the summation of large sets of floating-point numbers. We present a new representation of the floating-point number system in which a number is represented as a linear combination of integers and the coefficients are powers of the base of the floating-point system. The approach allows to build up an accurate flo...
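
The snippet is cut off before the representation is fully described, so it is not reproduced here. For context only, classical compensated (Kahan) summation, sketched below, shows the kind of accuracy such schemes pursue; it is not the integer-based representation proposed in the paper.

    #include <stdio.h>

    /* Kahan compensated summation: a running correction term recovers most
     * of the rounding error of each addition.
     * (Compile without value-unsafe optimizations such as -ffast-math.) */
    static double kahan_sum(const double *a, int n) {
        double s = 0.0, c = 0.0;        /* running sum and compensation */
        for (int i = 0; i < n; i++) {
            double y = a[i] - c;        /* corrected next term */
            double t = s + y;           /* low-order digits of y may be lost... */
            c = (t - s) - y;            /* ...but are recovered here */
            s = t;
        }
        return s;
    }

    int main(void) {
        double a[] = { 1e16, 1.0, 1.0, 1.0, 1.0, -1e16 };
        double naive = 0.0;
        for (int i = 0; i < 6; i++) naive += a[i];
        printf("naive = %g   kahan = %g\n", naive, kahan_sum(a, 6));  /* 0 vs 4 */
        return 0;
    }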

A Parameterized Floating-Point Formalization in HOL Light

We present a new, open-source formalization of fixed and floating-point numbers for arbitrary radix and precision that is now part of the HOL Light distribution [10]. We prove correctness and error bounds for the four different rounding modes, and formalize a subset of the IEEE 754 [1] standard by gluing together a set of fixed-point and floating-point numbers to represent the subnormals and no...
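
The rounding-error bounds mentioned are typically instances of the standard model of floating-point arithmetic; for round-to-nearest with precision p, and in the absence of underflow and overflow (stated here as general background, not as the formalization's exact theorem):

    \mathrm{fl}(x \circ y) = (x \circ y)(1 + \delta), \qquad |\delta| \le u = 2^{-p}, \qquad \circ \in \{+, -, \times, /\}

For binary64, p = 53 and u = 2^{-53} ≈ 1.1 × 10^{-16}.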

Accurate Sum and Dot Product

Algorithms for summation and dot product of floating point numbers are presented which are fast in terms of measured computing time. We show that the computed results are as accurate as if computed in twice or K-fold working precision, K ≥ 3. For twice the working precision our algorithms for summation and dot product are some 40 % faster than the corresponding XBLAS routines while sharing simi...
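
The summation part of such algorithms rests on the error-free transformation TwoSum, which splits a + b exactly into the rounded sum and its rounding error. The C sketch below follows the spirit of the paper's Sum2 (cascaded compensated summation); the function names and the small driver are illustrative choices.

    #include <stdio.h>

    /* Error-free transformation (TwoSum): a + b == s + e exactly, where s is
     * the rounded sum and e its rounding error.
     * (Compile without value-unsafe optimizations such as -ffast-math.) */
    static void two_sum(double a, double b, double *s, double *e) {
        *s = a + b;
        double z = *s - a;
        *e = (a - (*s - z)) + (b - z);
    }

    /* Cascaded compensated summation in the spirit of Sum2: the result is
     * about as accurate as if the sum were evaluated in doubled precision. */
    static double sum2(const double *p, int n) {
        double s = p[0], sigma = 0.0;
        for (int i = 1; i < n; i++) {
            double e;
            two_sum(s, p[i], &s, &e);
            sigma += e;                 /* accumulate the rounding errors */
        }
        return s + sigma;
    }

    int main(void) {
        double p[] = { 1.0, 1e100, 1.0, -1e100 };
        printf("sum2 = %g\n", sum2(p, 4));   /* 2, where naive summation gives 0 */
        return 0;
    }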

Journal:
  • CoRR

Volume: abs/1201.5975

Publication year: 2012